Detecting Friendship Within Dynamic Online Interaction Networks

In many complex social systems, the timing and frequency of interactions between individuals
are observable but friendship ties are hidden. Recovering these hidden ties, particularly for casual 
users who are relatively less active, would enable a wide variety of friendship-aware applications 
in domains where labeled data are often unavailable, including online advertising and national 
security. Here, we investigate the accuracy of multiple statistical features, based either purely on 
temporal interaction patterns or on the cooperative nature of the interactions, for automatically 
extracting latent social ties. Using self-reported friendship and non-friendship labels derived from
an anonymous online survey, we learn highly accurate predictors for recovering hidden friendships 
within a massive online data set encompassing 18 billion interactions among 17 million individuals 
of the popular online game Halo: Reach. We find that the accuracy of many features improves as 
more data accumulates, and cooperative features are generally reliable. However, periodicities in 
interaction time series are sufficient to correctly classify 95% of ties, even for casual users. These 
results clarify the nature of friendship in online social environments and suggest new opportunities 
and new privacy concerns for friendship-aware applications that do not require the disclosure of 
private friendship information. 



I. INTRODUCTION 

For many online social systems, understanding which
users are "friends" can be extremely useful, e.g., for targeted
word-of-mouth advertising, product recommendations,
or detecting hidden social relationships. In some
systems these relationships are provided by the users
themselves, but even when the friendships are not explicitly
labeled, we can often still observe the timing and
character of pairwise social interactions, for example,
citations between scientists, appearances together in
photos, exchanges of tweets [1], emails, or phone
calls, playing games together, and purchasing goods or
services from businesses.

This raises the question of whether hidden or latent 
friendship ties can be inferred from such interaction data 
alone. For most online systems, this is complicated by the 
typically heavy-tailed distribution in the volume of inter- 
actions generated by different users: only a small fraction 
of users account for the majority of all interactions, pro- 
viding deep histories from which to learn, while most 
users are "casual," generating relatively little data. Inferring
latent ties from observable interactions promises
both to create new opportunities and to raise new privacy
concerns for friendship-aware applications, e.g., in online
advertising, where latent tie inference could facilitate social
marketing or better estimate product preferences,
and online security, where it could uncover clandestine
associations and activities.

For many computational social science questions, online
multiplayer games are a rich but underutilized source
of detailed, temporal interaction data. Past work in this
area has shed light on competitive dynamics, social organization,
economic trading networks, and deviant behavior.
Here we utilize a massive data set from the
popular online multiplayer game Halo: Reach to inves- 
tigate the degree to which latent social ties can be au- 
tomatically identified from social interaction data alone. 
This data set contains details on more than 18 billion 
interactions among more than 17 million unique individ- 
uals across 700 million game instances, and serves as a 
model system by which to investigate the general ques- 
tion of detecting friendship in dynamic online interaction 
networks. 

From these data, we extract a temporal interaction 
network, in which two individuals are connected at time 
t if they shared a social interaction at time t. Here, an
interaction means playing a game together. We annotate
each interaction with information about its character and
magnitude, e.g., whether it was a prosocial or antisocial
interaction. We then combine these data with the results of
an anonymous online survey of the player population [8],
including friendship and non-friendship labels for every
individual in each respondent's time series.

We then design and study nine statistical features 
representing temporal and cooperative-type interactions. 
Temporal features capture interaction patterns via peri- 
odicities, interaction volume, and the similarity in actions 
within the online system. Cooperative features quantify 
the prosocial character of the interactions such as direct 
and indirect assistance in scoring points, and "betray- 
als," the equivalent of scoring on one's own goal in the 
game, which indicates antisocial behavior toward the betrayed
individual. Although our cooperative features rely
on in-game data specific to Reach, the intention here is
to capture the character or sign of the interaction,
and thus analogous features can likely be constructed for
other types of interaction data. For instance, the interaction
patterns in the game setting could correspond to
check-ins with a location-based application; the cooperative
features in the game could correspond to positive or
negative comments on an online forum.

From a social theory perspective, temporal features 
are expected to provide a weaker signal than coopera- 
tive ones because the former ignore the additional infor- 
mation explicitly contained in the latter. On the other 
hand, temporal features are more generalizable because 
they can always be derived from interaction time series, 
even when auxiliary information is unavailable, e.g., to 
study co-location, online social interaction, and communication
data [1, 10]. In contrast to many standard
data sets, our data allow us to directly compare the pre- 
dictive utility of these two types of features. 

The self-reported friend and non-friend labels from the 
online survey allow us to quantitatively measure the ac- 
curacy of our latent tie inference methods, and we take 
a supervised approach to learn which features perform 
well at this task. We also explore the way their per- 
formance degrades as we examine ties with progressively 
less data, which is an important concern for real-world 
applications. In general, we find that latent friendship 
ties can be predicted with over 95% accuracy when two 
individuals have had at least 10 interactions. This level 
of accuracy is achievable using either the autocorrelation
of interactions (temporal) or the number of assists (cooperative).
The total volume of interactions between individuals
is also a good predictor, but it is less efficient than
our two best features. These results clarify the nature of 
friendship in online social environments and suggest new 
opportunities and new privacy concerns for friendship- 
aware applications that do not require the disclosure of 
private friendship information. 



II. RELATED WORK 

Our work draws from three distinct lines of research. 
Most uses of online game data have focused on understanding
certain aspects of human social behavior
in online environments. Examples include individual
and team performance, expert behavior [17],
homophily [18], group formation [19], economic activity
[20, 21], and deviant behavior [22]. Most of this work
has focused on massively multiplayer online role playing
games (MMORPGs), e.g., World of Warcraft, although a
few studies have examined social behavior in first person shooter
(FPS) games like Reach. Relatively little of this work
has focused on the structure of social networks. 

Some studies in social network analysis have considered 
human behavioral patterns in proximity and periodicity, 
e.g., questions regarding how the accumulation of interactions
over time or physical proximity and geographic
location can influence the induced social network structure
[1, 13]. Few of these studies have focused on
online interactions and the way they reflect underlying 
social ties. 

Another significant thread comes from the literature on 
link prediction. Several studies have considered the question
of predicting links in future time steps based on the
pattern of links in the past. Others have focused on
predicting hidden or missing links when given a partially
observed network [25], and on how similarities in preferences
and periodic behavior can predict social ties and
their sign (friend or foe, trust or distrust) [26, 27].

Of particular relevance is a recent study that applied a 
similar approach to ours, with good results, to the more 
narrow question of distinguishing close and not close 
friends among a user's ties on Facebook [28]. Otherwise, 
very few studies have focused on the specific question 
and context considered here. A distinguishing feature 
of our study is the use of survey data, which provides 
us with "ground truth" labels of subjective friendship or 
non-friendship for observed interactions. By combining 
these ground-truth labels with the detailed data on pair- 
wise social interactions among all individuals, we directly 
explore the question of distinguishing mere interactions 
from genuine latent friendships. 



III. DATA AND SURVEY 

A. Game details 

Our interaction data are drawn from Halo: Reach, a 
popular online first person shooter game. It was publicly 
released by Bungie Inc., a former subdivision of Microsoft 
Game Studios, on 14 September 2010, and has generated 
more than 1 billion games since. Within the Reach sys- 
tem, individuals choose from among seven game types 
and numerous subtypes, which are played over more than 
33 terrain maps. Games can be played alone or with or 
against other individuals over the Xbox Live online sys- 
tem, and each individual on the system is identified by 
a unique "gamertag." Players may choose from among 
several "playlists," which subdivide the total player pop- 
ulation and which are based around specific game types. 

Once a playlist is chosen, individuals or small "parties" 
of players (typically friends) are grouped into teams by 
an in-game "matchmaking" algorithm. This algorithm 
is based on the TrueSkill system [29], which attempts
to create teams with equal total skill (subject to some 
practical constraints). When a competition is complete, 
by default all its players are placed in a new game to- 
gether, but all players or any subset may choose to reen- 
ter the matchmaking process to find new teammates or 
competitors. Both individual game and individual player
summaries were made available through the Halo Reach
Stats API.^1

Through this interface, we collected the first 700 mil- 
lion game instances (roughly 305 days of activity by 
17 million individuals). Among other information, each 
game file includes a Unix timestamp, game type label, 
and a list of gamertags. Each gamertag is associated 
with a particular team and a set of attributes indicating 
specific cooperative behavior actions amongst the indi- 
viduals, described below. This large database provides 
us with complete data on the timing and character of in- 
teractions between individuals but provides no informa- 
tion about which interactions are produced by friendships 
versus non-friendships. 



B. Survey 

We combine these in-game behavioral data with the results
of an anonymous online survey of Reach players [8].
In the survey, participants supplied their gamertag, from
which we generated a list of all other gamertags that
had ever appeared in a game with the participant. From
this list, the participant identified which individuals were
friends.^2 We interpret these subjective friendship labels
as ground truth. From these data, we constructed a social
network with links pointing from participants to their
labeled friends. In our supervised learning analysis, both
a labeled friendship and the absence of a label are treated
as values to be predicted (i.e., we assume survey respondents
explicitly chose not to label their co-player as a
friend). Of the 965 participants who had completed the
friendship portion of the survey by April 2012, 847 individuals
appear in our data (the first 305 days of play);
this yielded 14,045 latent friendship ties and 7,159,989
non-friendship ties.
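
As a quick check on how imbalanced the two label classes are, the counts reported above give the fraction of labeled ties that are friendships. This is only arithmetic on the stated numbers; it agrees with the roughly 0.2% figure quoted in Section IV C:

$$ \frac{14{,}045}{14{,}045 + 7{,}159{,}989} = \frac{14{,}045}{7{,}174{,}034} \approx 0.00196 \approx 0.2\% $$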

Survey participants were a sparse sample of a large
population, and the resulting social network is composed
of mostly disconnected egocentric subgraphs. Labeled
friendship ties are directed edges, while observed
interactions are bidirectional. We note that because survey
participants were recruited through advertising on
web fora related to Halo: Reach, they are a non-uniform
sample of the general Reach population, e.g., they tended
to be unusually skilled players [8]. Nonetheless, our sample
has sufficient variability to demonstrate the general
applicability of our results across the player population.



C. Interaction network 

We represent the set of pairwise interactions as an
annotated temporal network, in which edges have endpoints,
exist at a specific moment in time, and are decorated
with auxiliary information on the character and
context of the interaction. Vertices in the network correspond
to gamertags, and two vertices are connected if
they appear in a game instance together at time t (time
of day, in 10 minute intervals). Each vertex thus has
a sequence or time series of interactions with other vertices.
We then annotate each edge with information such as
whether the corresponding individuals were on the same
team, what game type produced the interaction, and
the number of games played together at time t. The resulting
network, derived from our complete game sample, contains
17,286,270 vertices and 18,305,874,864 temporal edges,
and spans 305 days. The subgraph of interactions by our
survey participants contains a total of 2,531,479 vertices
and 665,401,283 temporal edges over the same period of
time.
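
As a rough illustration of this representation, the following Python sketch builds per-pair interaction time series from a list of game records. The record layout (a Unix timestamp plus a list of gamertags) and the helper name `build_pair_series` are assumptions made for this example, not the format of the actual game files. Timestamps are binned into 10-minute intervals as described in the text, and each temporal edge is annotated with the number of games played together in that bin.

```python
from collections import defaultdict
from itertools import combinations

BIN_SECONDS = 600  # 10-minute intervals, as described in the text


def build_pair_series(games):
    """Map each unordered pair of gamertags to {time_bin: games_played}.

    `games` is an iterable of (unix_timestamp, [gamertag, ...]) tuples;
    this layout is a hypothetical stand-in for the real game files.
    """
    series = defaultdict(lambda: defaultdict(int))
    for timestamp, players in games:
        time_bin = timestamp // BIN_SECONDS
        # Every pair of distinct players in the game shares one interaction.
        for x, y in combinations(sorted(set(players)), 2):
            series[(x, y)][time_bin] += 1
    return series


games = [
    (0, ["ana", "bo", "cy"]),
    (300, ["ana", "bo"]),   # falls in the same 10-minute bin as the first game
    (1200, ["ana", "cy"]),
]
pairs = build_pair_series(games)
```

Storing only the bins in which a pair actually interacts keeps the structure sparse, which matters when most pairs meet only once.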



IV. INFERRING FRIENDSHIP 

To recover latent friendship ties given only the time
series of annotated interactions between pairs of individuals,
we take a supervised learning approach. Using classification
trees and a logistic regression classifier [32], we
learn which features are best for predicting latent friendship
ties. Of particular interest are computationally
lightweight models that could be applied on large-scale
systems.

The self-reported friendship and non-friendship labels 
from the anonymous online survey serve as prediction tar- 
gets. We investigate the accuracy of our statistical fea- 
tures, divided into temporal and cooperative classes and 
considered individually, for predicting latent ties. Tem- 
poral features are derived explicitly from a time series of 
interactions, without regard to the character or context 
of those interactions. Cooperative features are derived 
from the auxiliary data and capture the degree to which 
an interaction is prosocial. In the construction of several 
features, we use the massive unlabeled data to derive sim- 
ple statistical expectations that are used to normalize the 
raw statistics. 



^1 The API was active from September 2010 through November
2012. API documentation was taken offline in September 2012.

^2 In the survey, a friend is defined as a person known by the
respondent at least casually, either offline or online.



A. Temporal features 

Overall gameplay dynamics within the Halo: Reach
system are highly periodic (Fig. 1), with the peak online
population on each day of the week occurring between
the hours of 3:00pm and 6:00pm Pacific Standard Time
(PST) and the minimum occurring near 4:30am. Since
most players reside in the US and the majority of the
US population is located on either the East or West
coasts, the three hour window of peak play seems
likely related to the coasts' three hour time difference.
Furthermore, the peak period is roughly synchronized
with the class schedules of secondary and post-secondary
schools, where the majority of classes occur between
the hours of 8:00am and 2:00pm. Finally, we observe
a strong weekend effect, with Friday night game play
rising to weekend levels, Saturday play remaining high
and steady for the majority of the day and night, and
Sunday play peaking relatively early and then tapering
off after roughly 3:00pm. These regularities suggest
several statistical features for capturing latent friendship
ties.

FIG. 1. Number of unique individuals ever seen at a given
time of day (in Pacific Standard Time), across the 305 days
spanned by the data, illustrating significant daily and weekly
periodicities. (Horizontal axis: hour of day, PST.)

Pair autocorrelation. Pairs of individuals in Reach
that are friends are known to play many more consecutive
games (12, on average, or about 2 hours of time)
than non-friends (1.25, on average). Thus, continuous
interaction over a significant span of time is likely
an indication of a latent tie, while more intermittent
interactions likely indicate a non-friend tie, given the large
population of non-friends available to play at any time.
The expected diurnal and weekly cycles observed in the
data will modulate these behaviors, and a reasonable
approach for their quantification is via interaction
periodicity. Let

$$ n_{x,y}(t) = \mathbb{1}\{x \text{ and } y \text{ play together at time } t\} \tag{1} $$

represent the time series of binary interactions between
individuals x and y, where 1 indicates an interaction at
time t and 0 indicates no interaction. If x and y are
friends, we expect $n_{x,y}(t)$ to exhibit stronger periodicity
than for non-friends. This expectation may be quantified
as the autocorrelation of the time series $n_{x,y}(t)$ over all
time lags $\tau$:

$$ AC_{x,y} = \sum_{\tau} \sum_{t} n_{x,y}(t)\, n_{x,y}(t - \tau) . \tag{2} $$

If $n_{x,y}(t)$ is generated by a non-friend pair, $AC_{x,y}$ should
be small because these individuals do not interact
regularly. On the other hand, if $n_{x,y}(t)$ is generated by
a friend pair, we expect $AC_{x,y}$ to be large.
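
A minimal sketch of the autocorrelation feature in Eq. (2) follows. The function name `pair_autocorrelation` and the `max_lag` parameter are assumptions of this example. The text sums over "all time lags" without stating the range; summing over every positive, zero, and negative lag of a finite series would collapse to the squared number of interactions, so this sketch restricts the sum to positive lags up to `max_lag`. That restriction is a choice made here, not one confirmed by the source.

```python
def pair_autocorrelation(active_bins, max_lag):
    """Sum over lags 1..max_lag of sum_t n(t) * n(t - lag).

    `active_bins` holds the time bins t where n_{x,y}(t) = 1.
    Restricting to positive lags up to `max_lag` is an assumption of this sketch.
    """
    active = set(active_bins)
    total = 0
    for lag in range(1, max_lag + 1):
        # n(t) * n(t - lag) is 1 only when both bins contain an interaction.
        total += sum(1 for t in active if (t - lag) in active)
    return total


# A pair that plays in runs of consecutive bins on two days (144 bins per day).
friends = [0, 1, 2, 3, 144, 145, 146, 147]
# A pair that meets the same number of times at scattered moments.
strangers = [0, 37, 95, 210, 333, 470, 512, 700]

ac_friends = pair_autocorrelation(friends, max_lag=144)
ac_strangers = pair_autocorrelation(strangers, max_lag=144)
```

Both toy pairs have eight interactions, so pair frequency alone cannot separate them, while the run-structured series scores higher on this measure.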

Pair frequency. A corollary of our previous argument
is that friend pairs will likely produce a greater number
of interactions over a fixed time period than non-friend
pairs. Let $N_x$ be the total number of games played by
individual x, and

$$ N_{x,y} = \sum_{t} n_{x,y}(t) \tag{3} $$

be the number of those games played with individual
y. The fraction $N_{x,y}/N_x$ thus captures the share of x's
interactions that involve y. Because we expect friend
pairs to produce more interactions than non-friend pairs,
this fraction should be relatively large for a latent friend
pair, even if the total number of x's interactions, $N_x$, is
small.
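
A hypothetical worked example (the numbers are invented for illustration, not taken from the data) shows why normalizing by $N_x$ matters for casual users:

$$ \frac{N_{x,y}}{N_x} = \frac{12}{20} = 0.6 \quad \text{(casual player, mostly one partner)}, \qquad \frac{N_{x',y'}}{N_{x'}} = \frac{40}{2000} = 0.02 \quad \text{(heavy player, incidental co-player)} $$

Although the second pair has more interactions in absolute terms, the first pair accounts for a far larger share of the individual's activity.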

Individual entropy. Recent research has shown that
individuals who maintain diverse or unpredictable patterns
in their daily schedules in the physical world tend
to have larger numbers of friends, as quantified by an
entropy measure [10]. However, online environments differ from
physical ones in important ways, being more flexible and
offering fewer constraints on "large" movements. It is
thus an interesting question whether a digital version of
these entropy measures can predict latent social ties as
well as their physical analogs.

Toward this end, we define entropy measures on an
individual's schedule (when they interact), game type
(in which game context they interact), and combined
schedule and game type. For a given individual x, we
observe the series of x's appearances at "location" $\ell \in \mathcal{L}$,
where $\mathcal{L}$ represents the set of all possible locations. We
consider three versions of this measure: (i) schedule
entropy $H_t(x)$, with locations as days of the week, (ii) spatial
entropy $H_s(x)$, with locations as Reach "playlists"
(which subdivide the full population into groups wanting
to play a specific type of game), and (iii) the entropy
$H_{s,t}(x)$ over all pairs of schedule and spatial locations.

Mathematically, we compute a given entropy measure
as

$$ H_{\mathcal{L}}(x) = -\sum_{\ell \in \mathcal{L}} p(x, \ell) \log p(x, \ell) , \tag{4} $$

where $p(x, \ell)$ corresponds to the observed probability of
individual x at location $\ell$, i.e., the fraction of all
observations of x in which x is observed at location $\ell$. We
expect the schedule entropy to quantify the diversity of
an individual's interactions across time: individuals who
typically play on Tuesdays (say, at 8:00pm to meet their
friends) will have a lower entropy than those who play in
a more ad hoc fashion. Similarly, we expect the combined
schedule-location entropy to capture regularities such as
playing in one game environment on Tuesdays but in different
environments over the rest of the week.

For predicting friendships, we take the sum of the
individuals' entropies, i.e., $H_t(x) + H_t(y)$, as opposed to a
joint entropy measure. A low sum of entropy measures
would suggest that both players have low diversity playing
patterns, which need not be coordinated. A higher
sum would suggest that at least one player of the pair has
a more unpredictable schedule; however, knowing this is
true for only one player is sufficient to suggest that other
temporal signals might be more meaningful. An individual
who plays sporadically but with a few regularities
(e.g., consistently playing on Saturday mornings with the
same set of individuals) shows evidence of social coordination.
A low entropy pair would then likely be either
highly autocorrelated if they played on similar schedules,
or exhibit very low autocorrelation if on different schedules.
A rich class of temporal features lets us better describe
the temporal patterns exhibited by the players in
our sample and test existing hypotheses [10].
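
The three entropy variants of Eq. (4) differ only in what counts as a "location," so one helper can compute all of them. The sketch below is illustrative: the function name `location_entropy`, the toy observations, and the use of the natural logarithm are assumptions, since the text does not state the base of the logarithm.

```python
import math
from collections import Counter


def location_entropy(observations):
    """Shannon entropy (natural log) of the empirical distribution over locations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# (day_of_week, playlist) observations for one hypothetical player.
obs = [("Tue", "slayer"), ("Tue", "slayer"), ("Tue", "objective"), ("Sat", "objective")]

h_t = location_entropy(day for day, _ in obs)            # schedule entropy H_t(x)
h_s = location_entropy(playlist for _, playlist in obs)  # spatial entropy H_s(x)
h_st = location_entropy(obs)                             # combined entropy H_{s,t}(x)


def pair_entropy_sum(obs_x, obs_y):
    """Pair feature used in the text: the sum of the two individual entropies."""
    return location_entropy(obs_x) + location_entropy(obs_y)
```

Because the pair feature is a plain sum, it carries no information about whether the two schedules actually line up, which is the limitation the text notes.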



B. Cooperative features

Our temporal features explicitly ignore the character
of the interactions. Recent work and previous
results suggest that friend pairs interact differently
than non-friend pairs, and features that capture these
differences can be expected to be good predictors of
latent ties [1, 11].

FIG. 2. A classification tree found using all features except
$A_{x,y}$. This tree only uses temporal features, and performs
well: the error rate is 0.0013, which is significantly better
than the naive classifier error rate of 0.0020. The out-of-sample
AUC for this tree is 0.924.

Betrayals. One feature of Reach that differs from many 
other online social systems is the ability to commit an 
explicitly antisocial action, in the form of a "betrayal." 
These actions are equivalent to an "own goal" and re- 
sult in a penalty for the betrayer's team. A quirk of the 
method by which Reach places players into a game is that 
occasionally friends are placed on opposing teams. Past 
work has shown that when this happens, one team tends 
to experience an increased betrayal rate as friends on one 
team turn against their teammates to help their friends 
on the other team.

For a pair of individuals x and y, we capture this tendency
by counting betrayals by x that help y, i.e., when
x and y are on different teams. Let $b_x(t)$ count the number
of betrayals performed by x at time t. Our measure
is then

$$ B_{x,y} = \sum_{t} b_x(t)\, \mathbb{1}\{x, y \text{ playing on different teams}\} . \tag{5} $$

Direct assistance. During a game instance, individuals 
can provide direct assistance to each other in scoring a 
point. Like betrayals, this prosocial action can occur with 
or without deliberate coordination of actions. Because 
friend pairs are expected to exhibit greater frequencies 
of prosocial behavior toward each other, a large number 
of direct assists should correlate with latent friendship 
ties. 



Let $a_x(t)$ count the number of direct assists performed
by individual x at time t. The total number of assists
$A_{x,y}$ captures the volume of prosocial behavior on this
tie,

$$ A_{x,y} = \sum_{t} a_x(t)\, \mathbb{1}\{x, y \text{ playing on same team}\} . \tag{6} $$

Indirect assistance. Reach also allows an individual 
to indirectly assist another in scoring points, in which x 
drives a vehicle while y operates a vehicle-mounted gun. 
This behavior requires substantially more coordination 
than direct assists, and thus may provide a more infor- 
mative measure of latent friendship. 

Let $v_x(t)$ count the number of indirect assists
attributed to x at time t. The total number of indirect
assists from x to y, denoted $V_{x,y}$, is

$$ V_{x,y} = \sum_{t} v_x(t)\, \mathbb{1}\{x, y \text{ playing on same team}\} . \tag{7} $$
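
Equations (5) through (7) share one pattern: a per-player count at time t, gated by an indicator on whether x and y are on the same or different teams. The sketch below accumulates all three totals in one pass. The event layout (dictionaries with `team`, `betrayals`, `assists`, and `driver_assists` keys) and the function name `cooperative_totals` are hypothetical, chosen only to make the example self-contained.

```python
def cooperative_totals(events, x, y):
    """Return (B_xy, A_xy, V_xy) for the ordered pair (x, y).

    Each event describes one time bin in which x and y co-appear, with
    hypothetical keys: 'team' (player -> team id) and the per-player counts
    'betrayals', 'assists', and 'driver_assists'.
    """
    b = a = v = 0
    for e in events:
        same_team = e["team"][x] == e["team"][y]
        if same_team:
            a += e["assists"].get(x, 0)          # direct assists by x, Eq. (6)
            v += e["driver_assists"].get(x, 0)   # indirect assists by x, Eq. (7)
        else:
            b += e["betrayals"].get(x, 0)        # betrayals by x, Eq. (5)
    return b, a, v


events = [
    {"team": {"ana": 0, "bo": 0}, "assists": {"ana": 3},
     "driver_assists": {"ana": 1}, "betrayals": {}},
    {"team": {"ana": 0, "bo": 1}, "assists": {"ana": 2},
     "driver_assists": {}, "betrayals": {"ana": 1}},
]
totals = cooperative_totals(events, "ana", "bo")
```

As in the equations, the counts are those performed by x in the shared time bin; the features are directed, so swapping x and y generally gives different totals.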



C. Predicting latent friendships 

In our initial exploration of the predictability of la- 
tent ties from interaction data, we use classification trees 
to gain intuition about which features or combinations 
thereof are likely to be predictive. For this data ex- 
ploration, the interpretability of classification trees is a 
strength, compared to, e.g., random forests.^3 Subsequently,
we will consider the performance of individual
features.

^3 To aid interpretation of the tree results, we normalize feature
values by the average observed values taken from a uniform random
sample of roughly 1 million players. For each of the players
in the random sample we compute feature values for each player
they interacted with in the data.

For learning the classification tree, we divided our data 
into equally sized groups of individuals for testing and 
training. Cross-validation within the test set was used 
to control the tree's complexity, pruning branches that 
did not significantly improve the fit of the model. The 
resulting tree is highly compact, with only a few features
being retained (Fig. 2). Repeating our analysis with different
subsets of the features and different training and
test sets allows us to probe their relative importance and
correlation structure.

All of the resulting trees beat the baseline accuracy of 
a naive classifier. This baseline is in fact a significant 
barrier because the number of latent ties is a small frac- 
tion (0.2%) of the total number of ties we consider and 
we can naively score well by guessing that every tie is 
a non-friend. For this reason, we evaluate classifiers using the
Receiver Operating Characteristic (ROC) curve and the Area Under
the ROC Curve (AUC), which gives the probability that
the classifier will rank a randomly selected positive case
higher than a randomly selected negative case.
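
The ranking interpretation of AUC can be computed directly by comparing every positive score against every negative score. The sketch below is a brute-force illustration of that definition (the function name `auc` and the toy scores are assumptions, and counting ties as one half is a common convention adopted here); it is quadratic in the number of examples and not meant for data at the scale discussed in the text.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Scores from a hypothetical classifier for friend (positive) and non-friend ties.
informative = auc([0.9, 0.8], [0.1, 0.85])

# A classifier that gives every tie the same score, such as "always non-friend",
# cannot rank positives above negatives at all.
constant = auc([0.0, 0.0], [0.0, 0.0, 0.0])
```

This is why AUC is a useful complement to raw accuracy here: the constant classifier is almost always right under heavy class imbalance, yet its AUC stays at the random baseline.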

At the level of feature classes, temporal features are
most useful for correctly predicting friendship: when
trained on all features, the best tree splits first on autocorrelation
$AC_{x,y}$, followed by splits on combined schedule
and spatial entropy $H_{s,t}(x, y)$, autocorrelation $AC_{x,y}$
(again), and normalized pair frequency $N_{x,y}/N_x$. Similar
trees are found when training across all features excluding
direct assists $A_{x,y}$, or only temporal features: for the
three feature sets (all features, all features except assists,
and temporal features only), the final trees yield average
AUC scores of 0.830, 0.833, and 0.834, respectively. This
similarity in performance is unsurprising considering the
importance of temporal features (Fig. 

Surprisingly, fitting the model with just the cooperative
features yields classification performance nearly as
high (average AUC=0.789). This tree splits first on direct
assistance $A_{x,y}$, in agreement with our expectation
that latent friendship ties produce greater volumes of
prosocial interactions than non-friend ties, followed by
further splits on $A_{x,y}$ and indirect assistance $V_{x,y}$ over
certain ranges of $A_{x,y}$. The fact that autocorrelation
rather than direct assistance appears in the full model
suggests first that autocorrelation is a more reliable indicator
of latent friendship, but also that direct assistance
may be capturing similar information. We test this idea
by training a classification tree using all features except
autocorrelation $AC_{x,y}$. As expected, this tree splits
first on high $A_{x,y}$, with the remaining structure being
nearly identical to the models trained on all features or
a subset, but substituting $A_{x,y}$ for $AC_{x,y}$. The average
out-of-sample AUC for this set of trees is 0.800.



The structure and simplicity of the fitted trees suggest 
an underlying signature of friendship in the patterns of 
observed interactions. Specifically, highly periodic inter- 
actions are strongly indicative of friendship because they 
require nontrivial levels of social coordination within the 
online environment. That is, friends must, and do, actively
seek out each other in order to interact. Interestingly,
although autocorrelation is highly predictive,
combining it with spatial and schedule entropy reveals
some subtleties in social interactions. When given all
features or only temporal features, high autocorrelation
$AC_{x,y}$ with high spatial and schedule entropy $H_{s,t}(x, y)$
yields a good predictor of latent friendships.^4 Entropy
features by themselves are not particularly useful, but
they do become predictive for high values of autocorrelation.
Players with shared, low diversity playing habits
(and thus low individual entropy levels) can appear in 
the data as synchronized, even without any social co- 
ordination. Entropy measures then allow us to identify 
non-friends who have autocorrelated schedules. 



D. Lightweight predictors of friendship 

These results suggest that individual features alone 
may perform well at predicting latent friendships, 
and such features would make good computationally 
lightweight predictors that could realistically be deployed 
on a large-scale system. 

We explore this possibility using logistic regression
to build single-feature latent tie classifiers and measure
their performance using AUC. We divide our data
into training and test sets using random partitions such
that test and training sets are of equal size. Figure 3
shows the ROC curves for each of these individual-feature
models for predicting latent friendships, and the
corresponding models are summarized in Table I. Remarkably,
the two most predictive individual features,
autocorrelation $AC_{x,y}$ (temporal) and direct assistance
$A_{x,y}$ (cooperative), achieve near-perfect classification,
with AUCs of 0.99 and 0.98, respectively. To provide a
comparison, we note that another method inferred friendship
between graduate students with 96% accuracy using
a single temporal-spatial feature [12]. Both of our
single-feature models are computationally lightweight and could
thus potentially be deployed on a large-scale system to
automatically infer latent ties for friendship-aware
applications.
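
To make the idea of a lightweight single-feature classifier concrete, the following sketch fits a one-variable logistic regression with plain gradient ascent. It is a toy stand-in under stated assumptions, not the fitting procedure used for Table I: the function names, learning rate, step count, and the eight data points are all invented for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit p(friend | x) = sigmoid(b0 + b1 * x) by batch gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical feature values (e.g., a normalized autocorrelation score)
# with labels 1 = friend, 0 = non-friend; the classes overlap slightly.
xs = [0, 1, 2, 3, 2, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 0)   # predicted friendship probability at x = 0
p_high = sigmoid(b0 + b1 * 7)  # predicted friendship probability at x = 7
```

Once fitted, scoring a tie costs one multiplication and one sigmoid evaluation, which is what makes such models attractive for large-scale deployment.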

All of the remaining individual features perform
more poorly, indicating that none would perform well
as lightweight predictors in a real-world environment.
Naively, we expected the volume of interaction $N_{x,y}$, and


dom sample of roughly 1 million players. For each of the players 
in the random sample we compute feature values for each player 
they interacted with in the data. 



^4 Note that while the classification tree only classifies friends and
non-friends, the numbers shown in the leaves of Figure 2
indicate the maximum likelihood estimates of friendship
probability at each leaf.



feature                      symbol           θ        σ         |Z|       p         AUC

pair autocorrelation         $AC_{x,y}$                          30.000    < 0.001   0.99
normalized pair frequency    $N_{x,y}/N_x$    0.1390   0.00160   86.875    < 0.001   0.76
pair frequency               $N_{x,y}$        0.0390   0.00050   78.000    < 0.001   0.76
loc. entropy                 $H_s(x)$         1.8270   0.04300   42.488    < 0.001   0.65
sched. entropy               $H_t(x)$         1.5860   0.08100   19.580    < 0.001   0.50
sched. and loc. entropy      $H_{s,t}(x)$     2.5920   0.09600   27.000    < 0.001   0.61

direct assists               $A_{x,y}$        0.1230   0.00100   123.000   < 0.001   0.98
indirect assists             $V_{x,y}$        1.3170   0.01700   77.470    < 0.001   0.70
betrayals                    $B_{x,y}$                           48.590    < 0.001   0.64

TABLE I. Coefficients, θ, standard deviations, σ, Z-scores, |Z|, p values, p, and AUC values for logistic regression models fitted
to each individual feature for all friends and non-friends. AUC values of 0.5 correspond to a baseline random classifier.




FIG. 3. ROC curves for logistic regression models on individual
temporal and cooperative features. (Horizontal axis: false positive rate.)



the fraction of that volume assigned to a particular other
individual N_{x,y}/N_x, to be good indicators of latent ties.
However, we find this not to be the case. Upon closer
examination of the mislabeled ties, we see that some latent
ties spanned only a few interactions, and this number was
not significantly greater than the number of interactions
with non-friends. Our autocorrelation feature is robust to
this phenomenon because even these low-volume friendship
ties exhibit strong periodicity in the interactions they
generate.

Entropic features perform poorly alone because of
insufficient diversity in location behavior within the
population at large. That is, the number of interacting
individuals at any given time is large, while the number of
"locations" is relatively small. As a result, both friend
and non-friend pairs will often make similar choices about
which locations to visit. Controlling for both time and
space via H_{s,t}(x) provides a narrower filter on
individuals' behavior but does not substantially improve
performance. Furthermore, our entropy measure does not
consider the alignment of the individuals' schedules. As
we saw with the classification trees, it is only in
combination with other features, like autocorrelation, that
entropy becomes predictive.

The failure of entropy features alone to perform well 
in Reach is interesting, and clarifies their success in ap- 
plications to physical locations [l^l- When the number 
of locations is large relative to the size of the population 
exploring them, the probability becomes very low that a 
non-friend pair will have similar distributions over loca- 
tions in time. As the number of locations shrinks rela- 
tive to the population size, this probability increases and 
eventually swamps the signal produced by friend pairs, 
which is what we observe in Reach. However, combining 
this signal with other features, like the autocorrelation, 
preserves some of its predictive power by mediating tem- 
poral effects with surprisingness, even in a system with 
densely occupied locations. 

The poor performance of indirect assistance is unex- 
pected, given that such behavior in Reach indicates a 
strong prosocial orientation and that direct assistance 
performs so well. Examining the mislabeled ties, we find 
that indirect assistance is not always possible in every 
interaction, i.e., in every game type, and even when it is 
possible, it is an uncommon event. These factors place 
tight constraints on its predictive power and the raw be- 
havioral data we study contain examples of labeled friend 
pairs that exhibit no indirect assistance, thus making it 
difficult to identify a discriminative threshold. 

Past work on friendship in Reach [8] suggested that
our betrayal feature (in which an individual betrays their
teammates to help their friends on the opposite team)
should also correlate with latent friendship. And indeed
it does: the average betrayal total ⟨B_{x,y}⟩ = 6.27
for friend pairs but only 0.5 for non-friend pairs. The
significance of this difference is qualified by a
substantially larger variance for friend pairs (σ = 29.12
versus 2.13), likely because many friends choose not to
defect against their teammates, which lowers the
discriminative power of this feature.

FIG. 4. (Left) AUC as a function of N_x for each temporal and cooperative feature. The accuracy of AC_{x,y} and A_{x,y} is
robust to available individual information while the accuracy of V_{x,y}, N_{x,y} and N_{x,y}/N_x increases with N_x. Entropic features
remain relatively noisy regardless of N_x; see text for details. (Right) CCDF of N_x, number of games played, across all surveyed
individuals.



E. Predicting friendships for casual users 

Achieving good predictions for the few users who pro- 
duce large amounts of interaction data is useful. How- 
ever, it is less useful if the performance degrades sub- 
stantially as we consider users with progressively fewer 
observations, i.e., the casual users who typically make 
up the majority of individuals in an online system. To 
understand how robust our features are to the amount of 
available information, we study the performance of each 
individual feature as a function of N_x, the length of an
individual's history.

We grouped surveyed individuals into bins according
to the number of games they completed, N_x. To provide
a fine-grained look at individuals with short histories,
where data are plentiful, and a coarse view of long
histories, where data are sparse (Fig. 4, right), we used
bins of size 10 for N_x < 100 and bins of size 100 for
N_x > 100.

We then computed the average AUC and its standard
error by creating equal-sized training and test sets from
10 random permutations of the data in each bin, and ap- 
plying the individual-feature models. Examining these 
predictors' performance as a function of data volume pro- 
vides some guidance for predicting friendships in data 
sets with large heterogeneities in data availability. Ad- 
ditionally, this test serves as a robustness check on our 
previous conclusions by implicitly considering the length 
of individual history as a feature. 
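The binning and resampling procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the helper names (`history_bin`, `random_halves`, `mean_and_stderr`) are hypothetical, and a history of exactly 100 games is assigned to the coarse bins, a boundary case the text leaves unspecified.

```python
import math
import random


def history_bin(n_games):
    """Lower edge of the bin for a history of n_games completed games:
    width 10 below 100 games, width 100 otherwise (assumed boundary)."""
    width = 10 if n_games < 100 else 100
    return (n_games // width) * width


def random_halves(items, rng):
    """Randomly permute items and split them into equal-sized
    training and test sets (the test set gets the extra item if odd)."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_and_stderr(values):
    """Sample mean and standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, 0.0
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


# Example: one random split of a toy bin of 10 labeled pairs.
rng = random.Random(0)
train, test = random_halves(range(10), rng)
```

In use, one would repeat `random_halves` 10 times per bin, fit the single-feature model on each training half, score it on the test half, and pass the 10 resulting AUC values to `mean_and_stderr`.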

Figure 4 shows the average AUC for each feature as a
function of history length N_x. Again two features,
autocorrelation AC_{x,y} and direct assists A_{x,y}, are
consistently accurate predictors across all values of N_x.
For the autocorrelation feature, this robustness indicates
that pairs of friends interact more periodically than
non-friends, regardless of their overall level of activity
in the system. This signal is strong despite common
individual schedules (e.g., weekend nights) that could
potentially lead to artificially high autocorrelation
between non-friends. Furthermore, even when an individual's
data are sparse because he or she has completed very few
games (fewer than 10), both autocorrelation and direct
assistance have surprisingly strong predictive power,
yielding average AUC values close to 0.98.

Focusing on autocorrelation, its high accuracy at small
history lengths N_x is likely due to the large number of
individuals in the system at any one time. This very large
pool makes the probability of interacting with the same
non-friend individual more than a few times very low. In
real-world systems with low
thresholds for two individuals meeting by chance (e.g., 
colocation in highly constrained or small physical en- 
vironments), autocorrelation can be less discriminative 
and may require augmentation with other temporal or 
domain-specific features. Essentially, context can mat- 
ter: it is unlikely that everyone who frequents the same 
busy coffee shop on Monday mornings will be friends, due 
to the nature of that location, while it would be a good 
bet that many pairs of individuals attending the same 
weekly soccer practice would be friends. The large effec- 
tive capacity of an online system means that any signal 
from autocorrelation is likely to be significant. 

In their analysis of friendship and gameplay in Reach,
Mason and Clauset showed that individuals who are
friends tend to coordinate and cooperate in ways that
increase their team's score and the probability of winning
the match [8]. The strongly predictive nature of direct
assists A_{x,y} that we observe corroborates this finding,
and demonstrates that it holds over a wide range of N_x.
That is, even for casual users, counting these prosocial 
interactions is a reliable indicator of friendship because 
friends do indeed cooperate more than non-friends. 

Autocorrelation and direct assistance both maintain
high performance across all sizes of N_x. The temporal
features of raw and normalized pair frequencies, N_{x,y}
and N_{x,y}/N_x, are less reliable predictors for small
histories, but become more reliable as N_x increases. For
large histories (N_x > 400), both features reach AUC values
of nearly 0.90.

As we might have expected from our previous analysis,
the performance of the spatial and temporal entropy
features H_{s,t}(x), H_t(x), and H_s(x) does not improve
as we accumulate more data. Similarly, we observe fairly
weak improvements for indirect assists and betrayals B_{x,y}.

The remarkable accuracy achieved by our two best
features, autocorrelation of schedules and direct assistance
(prosocial interactions), demonstrates that lightweight
predictors can be reliable even when applied to
individuals with heterogeneous amounts of data by which to
estimate latent friendships.



V. SOCIAL NETWORK INFERENCE 

Given the excellent performance and computational
efficiency [37] of the autocorrelation of co-play feature,
AC_{x,y}, we use this lightweight predictor of friendship
to infer the social network of the entire population of 17
million players. For each pair of players in the
interaction network we compute AC_{x,y}, compare it to a
threshold, which we explain below, and then label the pair
of players as friends if their AC_{x,y} is greater than or
equal to the threshold value.



A. Threshold selection 

The survey respondents are a biased sample of
Reach players [8], being substantially more skilled than
the typical player and investing roughly an order of
magnitude more time playing than an average player.
It is thus possible that the survey sampling bias has
produced an oversampling or an undersampling of
the tail of the degree distribution. In an attempt to
control these opposing biases, we choose two thresholds,
one to show what the network looks like if the survey
respondents have fewer friends (undersampled tail) than
the population, and one to show network structure if the
respondents have more (oversampled tail).



^ The autocorrelation function can be computed in O(n log n) time
using a fast Fourier transform.
FIG. 5. CCDF of actual and inferred degree distributions
using only survey respondent data.



Undersampled tail - To control for the undersampled
tail bias we choose the AC_{x,y} that minimizes the
Kullback-Leibler divergence



D_{KL}(P \| Q) = \sum_{k} \ln\!\left( \frac{P(k)}{Q(k)} \right) P(k) ,    (8)



where P is the degree distribution of the social network
derived from the survey respondent data and Q is the
degree distribution calculated by creating edges between
players x and y if their AC_{x,y} is greater than or equal to
a chosen threshold. As shown in Figure 5, this approach
chooses AC_{x,y} = 197 and produces an inferred degree
distribution for the entire network of 17 million players
that matches the density near the head of the actual
distribution but with a heavier tail than the survey data.
It is not clear that this threshold choice necessarily
produces an abundance of false friendships, as players
with many friends are unlikely to have reported them
all due to the tedious and time-consuming nature of
providing this information via the survey. This hypothesis
is supported by empirical research, which showed
that self-survey respondents tend to underestimate
their interactions with individuals as a function of
recency. In our case, if a respondent did not interact
with a friend recently, the tie may have been unreported.
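The threshold search via Eq. (8) can be sketched as below. This is a minimal illustration, not the implementation used here: all function names, the candidate thresholds, and the toy AC values are hypothetical, and treating the divergence as infinite when Q(k) = 0 for an observed survey degree k is an assumption the text does not address.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical distribution over degrees k from a list of node degrees."""
    counts = Counter(degrees)
    total = len(degrees)
    return {k: c / total for k, c in counts.items()}


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_k P(k) ln(P(k) / Q(k)), summed over k with P(k) > 0.
    Returns infinity if Q assigns zero mass to an observed k (assumption)."""
    total = 0.0
    for k, pk in p.items():
        qk = q.get(k, 0.0)
        if qk == 0.0:
            return math.inf
        total += pk * math.log(pk / qk)
    return total


def inferred_degrees(ac_values, nodes, threshold):
    """Degrees of `nodes` when pairs with AC >= threshold become edges."""
    deg = Counter()
    for (x, y), ac in ac_values.items():
        if ac >= threshold:
            deg[x] += 1
            deg[y] += 1
    return [deg[v] for v in nodes]


def best_threshold_kl(ac_values, nodes, survey_degrees, candidates):
    """Candidate threshold whose inferred degree distribution Q is
    closest, in KL divergence, to the survey distribution P."""
    p = degree_distribution(survey_degrees)

    def divergence(t):
        q = degree_distribution(inferred_degrees(ac_values, nodes, t))
        return kl_divergence(p, q)

    return min(candidates, key=divergence)


# Toy, made-up autocorrelation values for four players.
toy_ac = {("a", "b"): 300, ("a", "c"): 150, ("b", "c"): 50, ("c", "d"): 250}
toy_nodes = ["a", "b", "c", "d"]
toy_survey_degrees = [1, 1, 1, 1]
toy_best = best_threshold_kl(toy_ac, toy_nodes, toy_survey_degrees, [100, 200, 400])
```

In the toy data, the middle candidate reproduces the survey degrees exactly, so its divergence is zero and it is selected.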

Oversampled tail - To control for the oversampled tail
bias, we compute the threshold by finding the largest
AC_{x,y} that produces a degree distribution with a maximum
degree no larger than the maximum degree observed in the
survey. This approach chooses AC_{x,y} = 1900 and the
tail of the inferred degree distribution agrees well with
the survey data but less so near the head (see Figure 5).
FIG. 6. (Left) Degree distribution and mean clustering coefficient, ⟨C_i⟩, as a function of degree for both thresholds using the
entire population of players. (Center) Binned clustering coefficient, C_i, plots for both thresholds using the entire population
of players, bin width = 0.1. (Right) Distribution of component sizes. The undersampled tail network contains 1,194,032
components. The oversampled tail network contains 991,932 components. (Horizontal axes: degree, k; clustering coefficient, C_i;
component size, s.)



B. Network structure 

These two thresholds represent reasonable bounds for 
what we expect for our interaction data as a whole. 
We now apply these two thresholds to the interactions 
among the full 17 million players and study the structure
of the induced social network. In the undersampled tail
scenario (AC_{x,y} = 197), the inferred network
consists of 8,373,201 nodes and 31,051,991 edges, while
the network inferred using the oversampled tail threshold
(AC_{x,y} = 1900) contains 4,732,405 nodes and 11,435,351
edges.

The top panel of Figure 6 (Left) indicates that in both
cases we observe degree distributions with heavy tails,
where the majority of nodes in the network are connected
to a small number of neighbors while a small number
of nodes are connected to a large number of neighbors.
When compared to the social graph of Facebook discussed
in [38], players in Reach have smaller numbers of friends.
The median friend count in Facebook is 99, while in Reach
it is roughly 1/100th the size: 1 and 2 at the over- and
undersampled thresholds respectively. This large difference
is likely caused by the high relative cost of establishing
and maintaining a friendship in Reach versus the more
cost-free nature of Facebook friendships. Specifically,
Reach players must consistently and periodically interact
over long periods of time, which is a significant
investment of effort, while in Facebook, they need only
click a request or accept button.

A vertex's clustering coefficient is defined as

C_i = \frac{\text{number of connected pairs of neighbors}}{\text{number of possible pairs of neighbors}}

and provides a principled way of measuring how close
vertex i and its neighbors are to forming a clique [39]. This
statistic equals unity when a vertex and its neighbors
form a clique, while it equals zero when none of its
neighbors are themselves pairwise connected. In our inferred
graph, shown in the bottom panel of Figure 6 (Left), a
substantial fraction of individuals (between 16% and 20%)
form tightly knit groups with high values of C_i.
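As a concrete reading of this definition, the sketch below computes C_i for one vertex of a small undirected graph stored as a dictionary of neighbor sets. The function name, the toy graph, and the convention C_i = 0 for vertices with fewer than two neighbors are assumptions made for illustration.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """Fraction of pairs of i's neighbors that are themselves connected.
    `adj` maps each vertex to the set of its neighbors (undirected graph).
    Vertices with fewer than two neighbors get 0.0 by convention (assumed)."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    connected_pairs = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    possible_pairs = k * (k - 1) / 2
    return connected_pairs / possible_pairs


# Toy graph: a triangle a-b-c with one extra vertex d attached to a.
toy_adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c_a = clustering_coefficient(toy_adj, "a")  # one of three neighbor pairs is linked
c_b = clustering_coefficient(toy_adj, "b")  # b and its neighbors form a clique
```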

Furthermore, the functional relation between the mean
clustering coefficient ⟨C_i⟩ and degree k_i is roughly the
same, regardless of which threshold we choose (Fig. 6,
Center). For example, even when a vertex has a degree of
100, its clustering coefficient is likely to be between 0.1
and 0.2. This suggests that threshold choice does not
substantially change the underlying network structure, and
these numbers are close to those estimated for the Facebook
social graph, where the mean clustering coefficient for a
vertex with degree 100 was 0.14 [38]. While the mean
clustering coefficient remains large independent of degree,
a mild decreasing trend is evident. This suggests that
nodes with high degree, who are likely high-volume players,
interact with others relatively less discriminately than
nodes with smaller degrees, a pattern also found in the
analysis of the Facebook social graph [38].

Figure 6 (Right) plots the distribution of component
sizes and indicates that the network contains a single 
large connected component composed of between two 
and four million players. The majority of the remaining 
nodes are spread amongst many components containing 
between roughly ten and twenty nodes. In the case of an 
undersampled tail, the network contains 1,194,032 com- 
ponents. In the oversampled case, the network contains 
991,932 components. 
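Component counts and sizes of this kind can be obtained with a breadth-first search over the inferred graph. The sketch below is illustrative only; the function name and toy graph are hypothetical, and it assumes an undirected graph in which every vertex appears as a key of the adjacency dictionary.

```python
from collections import deque


def component_sizes(adj):
    """Sizes of the connected components of an undirected graph,
    largest first. `adj` maps each vertex to the set of its neighbors."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Toy graph with three components: {a, b, c}, {d, e}, and {f}.
toy_graph = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
toy_sizes = component_sizes(toy_graph)
```

The length of the returned list is the number of components, and its first entry is the size of the largest one.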



VI. CONCLUSION 

Our motivating question was whether latent social ties
like friendships can be accurately recovered from
interaction data alone, and indeed we have shown that they
can, with remarkable accuracy. We demonstrated that
periodicity between interactions and specific prosocial
behaviors across these interactions are both highly
robust indicators of friendship, even in instances where
data are sparse. Information-theoretic measures of spatial
and temporal behavior, which are good indicators of
the quantity of social ties in other contexts, are not
effective at predicting the ties alone, but may be useful in
combination with other temporal features. These results
suggest a number of interesting points, both
for improving Reach and for enabling friendship-aware
applications in other domains.

Many online games, including Halo: Reach, rely on
matchmaking algorithms to place individuals onto teams
in order to start a new game instance. If the Reach
matchmaking algorithm works as desired, the teams are
equally matched and the competition's outcome is
unpredictable. However, when individuals play with friends,
their performance improves [8], and this synergy is not
included in the calculations of the matchmaking algorithm.
A friendship-aware matchmaking algorithm, using features
like the ones we consider here, could correct for the
effective increase in team skill that occurs when friends
play together, without reference to an external "friends
list", and thus produce better matched teams, more
enjoyable gameplay and overall greater engagement by the
users. Another improvement would be to suggest as friends
(to be added to a user's friends list) those individuals
with whom a player has exhibited significant prosocial
interactions, such as direct assists.

In the more general context of an online system where
we can observe interactions, but not labeled friendship
ties, our results could be applied in an unsupervised
manner. Using an unsupervised learning algorithm such as
k-means to separate friends from non-friends based on
the autocorrelation values of their co-interaction time
series should be relatively simple and robust. The
discriminatory power of autocorrelation and prosocial
behavior, even with sparse data, suggests that latent
friendship ties may in fact be easily detectable, due to the
nature of friendship itself. In a sense, periodic and
prosocial interactions are the definition of friendship,
and it may be difficult to maintain such a relationship
online without manifesting a signal in these ways.
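To make the unsupervised suggestion concrete, the sketch below runs Lloyd's algorithm with k = 2 on scalar autocorrelation values and returns the midpoint between the two cluster centers as a friend/non-friend boundary. It is an illustration under assumptions, not a tested procedure from this study: the function name, the initialization at the minimum and maximum values, and the toy numbers are all hypothetical.

```python
def two_means_1d(values, max_iter=100):
    """Lloyd's algorithm for k = 2 on scalar values.
    Returns (low_center, high_center, boundary), where boundary is the
    midpoint between the centers; values above it fall in the high
    (putative friend) cluster."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    for _ in range(max_iter):
        # Assign each value to its nearest center (ties go to the low cluster).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return lo, hi, (lo + hi) / 2


# Toy, made-up autocorrelation values for six player pairs.
toy_values = [1.0, 2.0, 3.0, 150.0, 200.0, 250.0]
low_center, high_center, boundary = two_means_1d(toy_values)
inferred_friends = [v for v in toy_values if v > boundary]
```

With heavy-tailed autocorrelation values, a log transform before clustering may be worth considering, although that choice is not evaluated here.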

Friendship-aware applications are only one new oppor- 
tunity presented by the automatic inference of latent so- 
cial ties from interaction data. The ease with which we 
were able to recover the latent friendship labels raises 
significant privacy questions, as these labels are often 
considered private information. The accurate recovery 
of such private signals from public interaction data may 
facilitate malicious applications. The social consequences
of large-scale deployment of friendship inference are
difficult to estimate.

Other benefits are more easily identified. For instance, 
many questions in computational social science may ben- 
efit from the accurate recovery of the underlying social 
network that generates the observed data. The general 
outlines of our results may have productive applications 
in many of these domains, e.g., in big data analyses of 
online social behavior. Our results are encouraging for 
settings where ground-truth data are at best rare and ex- 
pensive to collect. Robust methods to extrapolate from 
ground-truth survey data to large-scale latent social net- 
work prediction are of great practical interest. We look 
forward to seeing the exploration of these and other ben- 
eficial applications. 



VII. ACKNOWLEDGEMENTS 

We thank Christopher Aicher and Nora Connor for 
insightful comments and valuable feedback, Chris Schenk 
for his help developing the data acquisition system and 
web survey, and Bungie Inc. for providing access to the 
data. We acknowledge financial support from the James 
S. McDonnell Foundation. 



[1] D. J. de Solla Price, Science 149, 510 (1965).

[2] D. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher,
and J. Kleinberg, Proc. Natl. Acad. Sci. (USA) 107, 22436 (2010).

[3] S. Wu, J. Hofman, W. Mason, and D. Watts, in Proc.
20th Internat. Conf. on World Wide Web (ACM, 2011)
pp. 705-714.

[4] M. De Choudhury, W. Mason, J. Hofman, and D. Watts,
in Proc. 19th Internat. Conf. on World Wide Web (ACM,
2010) pp. 301-310.

[5] M. Szell, R. Lambiotte, and S. Thurner, Proc. Natl.
Acad. Sci. (USA) 107, 13636 (2010).

[6] B. Keegan, M. Ahmed, D. Williams, J. Srivastava, and
N. Contractor, in Internat. Conf. on Social Computing
(IEEE, 2010) pp. 201-208.

[7] J. Blackburn, R. Simha, N. Kourtellis, X. Zuo, M. Ripeanu,
J. Skvoretz, and A. Iamnitchi, in Proc. 21st Internat.
Conf. on World Wide Web (ACM, 2012) pp. 81-90.



[8] W. Mason and A. Clauset, in Proc. Computer Supported
Coop. Work and Social Comp. (2013).

[9] J. Leskovec, D. Huttenlocher, and J. Kleinberg, in Proc.
19th Internat. Conf. on World Wide Web (ACM, 2010)
pp. 641-650.

[10] J. Cranshaw, E. Toch, J. Hong, A. Kittur, and N. Sadeh,
in Proc. 12th Internat. Conf. on Ubi. Comp. (ACM,
Copenhagen, Denmark, 2010).

[11] A. Clauset and N. Eagle, in DIMACS Workshop on
Computational Methods for Dynamic Interaction Networks
(2007) pp. 1-5.

[12] N. Eagle, A. Pentland, and D. Lazer, Proc. Natl. Acad.
Sci. (USA) 106, 15274 (2009).

[13] K. Shim, R. Sharan, and J. Srivastava, Advances in
Know. Discovery and Data Mining, 71 (2010).

[14] K. Shim and J. Srivastava, in Internat. Conf. on Social
Computing (IEEE, 2010) pp. 128-136.

[15] K. J. Shim, S. Damania, C. DeLong, and J. Srivastava,
IEEE Potentials 30 (2011).






[16] K. Shim, K. Hsu, S. Damania, C. DeLong, and J. Srivastava,
in Internat. Conf. on Social Computing (IEEE, 2011)
pp. 617-620.

[17] D. Huffaker, J. Wang, J. Treem, M. Ahmad, L. Fullerton,
D. Williams, M. Poole, and N. Contractor, in Internat.
Conf. on Comp. Science and Engineering, Vol. 4 (IEEE,
2009) pp. 326-331.

[18] Y. Huang, C. Shen, D. Williams, and N. Contractor,
in Internat. Conf. on Comp. Science and Engineering,
Vol. 4 (IEEE, 2009) pp. 354-359.

[19] Y. Huang, M. Zhu, J. Wang, N. Pathak, C. Shen, B. Keegan,
D. Williams, and N. Contractor, in Internat. Conf.
on Comp. Science and Engineering (IEEE, 2009) pp.
122-127.

[20] E. Castronova, D. Williams, C. Shen, R. Ratan, L. Xiong,
Y. Huang, and B. Keegan, New Media & Society 11, 685
(2009).

[21] E. Bakshy, M. Simmons, D. Huffaker, C. Teng, and
L. Adamic, Proc. Fourth Internat. AAAI Conf. on Weblogs
and Social Media 1001, 48103 (2010).

[22] M. Ahmad, B. Keegan, J. Srivastava, D. Williams, and
N. Contractor, in Internat. Conf. on Comp. Science and
Engineering (IEEE, 2009) pp. 340-345.

[23] D. Liben-Nowell and J. Kleinberg, Journal of the American
Society for Info. Science and Technology 58, 1019
(2007).

[24] A. Clauset, C. Moore, and M. Newman, Nature 453, 98
(2008).

[25] P. Sarkar, D. Chakrabarti, and M. Jordan, in Proc. 29th
Internat. Conf. on Machine Learning (2012).

[26] L. Adamic and E. Adar, Social Networks 25, 211 (2003).

[27] X. Li, L. Guo, and Y. Zhao, in Proc. 17th Internat. Conf.
on World Wide Web (ACM, 2008) pp. 675-684.



[28] J. Jones, J. Settle, R. Bond, C. Fariss, C. Marlow, and
J. Fowler, PLoS ONE 8, e52168 (2013).

[29] R. Herbrich, T. Minka, and T. Graepel, Advances in
Neural Info. Proc. Systems 19, 569 (2007).

[30] The API was active from September 2010 through
November 2012. API documentation was taken offline in
September 2012.

[31] In the survey a friend is defined as a person known by
the respondent at least casually, either offline or online.

[32] C. Bishop, Pattern Recognition and Machine Learning,
Vol. 4 (Springer New York, 2006).

[33] N. Hanaki, A. Peterhansl, P. Dodds, and D. Watts,
Management Science 53, 1036 (2007).

[34] To aid interpretation of the tree results, we normalize
feature values by the average observed values taken from
a uniform random sample of roughly 1 million players.
For each of the players in the random sample we compute
feature values for each player they interacted with in the
data.

[35] A. Bradley, Pattern Recognition 30, 1145 (1997).

[36] Note that while the classification tree only classifies
friends and non-friends, the numbers shown in the leaves
of Figure 2 indicate the maximum likelihood estimates of
friendship probability at the leaf.

[37] The autocorrelation function can be computed in
O(n log n) time using a fast Fourier transform.

[38] J. Ugander, B. Karrer, L. Backstrom, and C. Marlow,
arXiv preprint arXiv:1111.4503 (2011).

[39] M. Newman, Networks: An Introduction (Oxford University
Press, 2010).



